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Archaeologists  are  increasingly  concerned  that  the  non-linear  relationship  between  the  calendric  and 
radiocarbon  timelines  may  introduce  anomalous  structures  into  radiocarbon-supported  temporal  fre¬ 
quency  distributions  (t/ds)  -  time  series  data  describing  temporal  fluctuations  in  the  frequency  of 
archaeological,  paleontological,  or  other  geological  deposits.  This  concern  emphasizes  a  need  for 
improved  middle  range  theory  on  tfd  formation,  addressing  the  interaction  between  several  stochastic 
processes.  This  paper  outlines  a  Monte  Carlo  simulation  designed  to  explore  the  influence  of  several 
variables  on  tfd  morphology,  including  the  nonlinear  calendric-to-radiocarbon  age  relationship.  The 
results  indicate  that  this  non-linear  relationship  entails  greater  variance  between  identically  generated 
t/ds  over  some  temporal  intervals  than  others  but  does  not  predictably  lead  to  tfd  peaks  over  these  in¬ 
tervals  as  previously  suggested.  Additional  variance  between  identically  generated  t/ds  results  from  small 
sample  sizes  and  high  values  in  the  underlying  TFD.  Smoothing  the  tfd  is  a  solution  not  only  to  calibration 
curve  interference  but  also  to  sample  size-dependent  sampling  error. 

©  2014  Elsevier  Ltd.  All  rights  reserved. 


1.  Introduction 

In  the  wake  of  Paul  Ehrlich's  The  Population  Bomb  (1968),  a 
dystopian  treatise  on  human  overpopulation,  demographers  and 
population  ecologists  have  paid  sustained  attention  to  the  popu¬ 
lation  growth  dynamics  of  our  species,  as  well  as  their  implications 
for  past,  present,  and  future  well-being  (Bloom,  2011;  Coale,  1974; 
Cohen,  1995;  Ellison  et  al,  2012:  757-759;  Gerland  et  al.,  2014; 
Hirschman,  2005;  Lee  and  Tuljapurkar,  2008;  Lee  et  al,  2009; 
Puleston  and  Tuljapurkar,  2008;  Rogers,  1992).  Both  population 
size  and  growth  factor  among  the  chief  concerns  of  demographic 
and  population  ecological  inquiry  (Chamberlain,  2006:  2,  2009: 
275—276;  Hassan,  1981:  1—2;  Micklin  and  Poston,  2005:  2;  Newell, 
1988:  3;  Preston  et  al.,  2001:  1-16;  Skalski  et  al.,  2005:  11),  and 
consequently  foundational  texts  in  these  fields  devote  considerable 


Abbreviations:  dTFA,  demographic  temporal  frequency  analysis;  LLN,  Law  of 
Large  Numbers;  MC,  Monte  Carlo;  MRT,  middle  range  theory;  tfd,  a  sample  tem¬ 
poral  frequency  distribution;  TFD,  a  theoretical  temporal  frequency  distribution, 
expressed  as  a  probability  distribution,  from  which  tfds  may  be  sampled;  spd, 
summed  probability  distribution. 
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attention  to  the  discussion  of  methods  for  describing  and  esti¬ 
mating  population  size  and  growth  (Newell,  1988:  8-9;  Preston 

et  al.,  2001:  1-16;  Skalski  et  al.,  2005:  11-15,  289-296);  for 
formalizing  the  connections  between  population  growth  and  other 
topics  of  demographic  concern  (e.g.,  population  composition  and 
the  age  pattern  of  vital  events;  Preston  et  al.,  2001:  138-190; 
Skalski  et  al.,  2005:  26-37);  and  for  fitting  population  growth 
models  to  empirical  data  and  projecting  such  models  into  hypo¬ 
thetical  futures  (Preston  et  al.,  2001:  117-137;  Skalski  et  al.,  2005: 
15-26,  297-322). 

Our  ability  to  identify  changes  in  the  size  of  past  populations 
is  relevant  to  a  diverse  range  of  research  topics  in  archaeological 
and  paleontological  research,  including  the  identification  of 
colonization  and  extirpation  processes,  factors  promoting  or 
undercutting  the  ecological  foothold  of  species,  and  causality  in 
sociopolitical  and  economic  change  and  in  cultural  diffusion  and 
transmission.  Over  the  last  four  decades,  this  need  has  increas¬ 
ingly  been  met  by  using  temporal  frequency  distributions  (t/ds)  - 
time  series  data  describing  temporal  variation  in  the  abundance 
of  archaeological  or  fossil  deposits  (Fig.  1)  -  as  proxy  census 
records,  or  at  minimum  as  records  of  occupation  intensity  or 
presence/absence  (see  Chamberlain,  2006:  131-132,  2009:  280). 
Most  regions  of  the  world  have  been  subjected  to  such 
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Fig.  1.  tfd  of  187  archaeological  site  occupations  from  the  Kodiak  Archipelago  in  the 
Gulf  of  Alaska,  expressed  as  a  histogram  (adapted  from  Maschner  et  al.,  2009b: 
Fig.  3.5).  The  blue  vertical  bar,  demarcating  the  timespan  850-750  cal  BP  (A.D. 
1100-1200),  is  a  century  purportedly  marked  by  a  dearth  of  archaeological  deposits  in 
Kodiak  (Maschner  et  al.,  2009b).  (For  interpretation  of  the  references  to  color  in  this 
figure  legend,  the  reader  is  referred  to  the  web  version  of  this  article.) 

demographic  temporal  frequency  analysis  (dTFA),  to  varying 
degrees  (Table  1). 

Research  questions  addressed  in  the  framework  of  dTFA  have 
varied  widely  in  temporal  scale,  focusing  on  both  long-run  popu¬ 
lation  processes  and  relatively  brief  episodes  of  demographic 
change.  Among  the  latter,  Maschner  and  colleagues  (2009b: 
38-41)  have  recently  applied  dTFA  to  the  evaluation  of  a  popula¬ 
tion  replacement  model  of  the  Kachemak-Koniag  transition  in  the 
Kodiak  Archipelago,  Gulf  of  Alaska,  approximately  900-700  cal  BP 
(Clark,  1998;  Fitzhugh,  2003;  Maschner  et  al.,  2009b;  Mills,  1994; 
Saltonstall  and  Steffian,  2006;  West,  2011).  This  cultural  change 
involved  the  appearance  of  ceramic  technology;  a  transition  from 
single-to  multi-room  pit  houses;  an  explosion  in  typical  site  size; 
intensified  usage  of  river  corridors  and  a  corresponding  increase  in 
the  freshwater  fishery;  changes  in  symbolic  expressions  and  mor¬ 
tuary  practices,  likely  relating  to  changes  in  social  and  belief  sys¬ 
tems  and  ritual  practices;  and  the  establishment  of 
institutionalized  sociopolitical  inequality.  A  number  of  archaeolo¬ 
gists  working  in  the  region  have  proposed  population  continuity 
across  this  cultural  transition  and  explained  it  as  a  process  of 
gradual  in  situ  change  (Jordan  and  Knecht,  1988;  Knecht,  1995; 
Saltonstall  and  Steffian,  2006),  perhaps  with  exogenous  influence 
from  a  small  number  of  immigrants  or  other  long-distance  contacts 
(Clark,  1988, 1992, 1998;  Fitzhugh,  2003).  In  contrast,  as  a  compo¬ 
nent  of  a  synthetic  ecological  model  of  cultural  and  demographic 
processes  spanning  the  Northeast  Pacific  Rim,  Maschner  et  al. 
(2009b)  have  argued  that  a  dTFA  of  archaeological  site  occupa¬ 
tions  from  Kodiak  supports  Dumond's  (1965,  1987,  1988,  1991, 
2005)  earlier  model  of  cultural  change  driven  by  demographic 
disruption  and  replacement,  in  which  Kachemak  populations 
largely  abandoned  the  region  and  were  subsequently  replaced  by 
Koniag  immigrants  (see  also  Hrdlicka,  1944).  The  most  striking 
feature  of  Maschner  and  colleagues'  dTFA  is  an  alleged  paucity  of 
sites  during  the  century  850-750  cal  BP,  though  a  closer  inspection 
of  their  tfds  (Fig.  1,  adapted  from  Maschner  et  al.,  2009b:  Fig.  3.5.) 
instead  suggests  that  this  paucity  came  the  following  century. 

Yet,  as  interest  in  dTFA  grows,  archaeologists  are  also  increas¬ 
ingly  aware  of  the  approach's  limitations  and  the  need  to  identify 


Table  1 

Archaeological  and  paleontological  publications  featuring  temporal  frequency 
distributions. 


Area 

References 

Africa 

Ozainne  et  al.,  2014 

Near  East 

Ammerman  et  al.,  1976 

Europe 

Adams  et  al.,  2001;  Armit  et  al.,  2013;  Collard  et  al.,  2010b; 
Crombe  and  Robinson,  in  press;  Gamble  et  al.,  2004,  2005; 
Gkiasta  et  al.,  2003;  Gonzalez-Samperiz  et  al.,  2009; 

Hinz  et  al.,  2012;  Housley  et  al.,  1997;  Kerr  and 

McCormick,  2014;  Liedgren  et  al.,  2007;  Martinez  et  al., 

1997;  Mellars  and  French,  2010;  Naudinot  et  al.,  in  press; 
Riede,  2008,  2009;  Schmidt  et  al.,  2012;  Shennan  and 
Edinborough,  2007;  Shennan  et  al.,  2013;  Tallavaara  and 

Seppa,  2012;  Tallavaara  et  al.,  2010;  Timpson  et  al.,  in  press; 
Turney  et  al.,  2006;  Whitehouse  et  al.,  2014;  Wicks  and 
Mithen,  2014;  Wicks  et  al.,  2014;  Woodbridge  et  al.,  2014 

Eastern  Asia 

Barton  et  al.,  2007;  Brantingham  et  al.,  2004;  Crema,  2012; 
Fiedel  and  Kuzmin,  2007;  Kim  and  Bae,  2010;  Kuzmin  and 
Keates,  2005;  MacPhee  et  al.,  2002;  Rhode  et  al.,  in  press; 
Surovell  et  al.,  2005;  Wang  et  al.,  2014 

North  America 

Adams  et  al.,  2001;  Albanese  and  Frison,  1995;  Anderson 
and  Freeburg,  2013;  Ballenger  and  Mabry,  2011;  Batt  and 
Pollard,  1996;  Bever,  2006;  Buchanan  et  al.,  2008,  2011; 

Collard  et  al.,  2008,  2010a;  Erlandson  et  al.,  1992,  2001; 

Esdale,  2008;  Fitzhugh,  2003;  Grier,  2006; 

Guthrie,  2006;  Haggarty 

et  al.,  1991;  Haynes,  1969;  Hutchison  and 

McMillan,  1997;  Kelly  et  al.,  2013;  Lepofsky  et  al., 

2005;  Louderback  et  al.,  2011;  Maschner  et  al.,  2009a, 

2009b;  Mason  et  al.,  2001;  Mullen,  2012;  Munoz  et  al., 

2010;  Peros  et  al.,  2010;  Potter,  2008;  Rasic  and 

Matheus,  2007;  Steele,  2010;  Story  and  Valastro,  1977; 

Surovell  et  al.,  2009; 

Taylor  et  al.,  2011;  Weninger  et  al.,  2011 

South  America 

Delgado  et  al.,  in  press;  Gil  et  al.,  2005; 

Rick,  1987;  Moreno  et  al.,  2009; 

Williams  et  al.,  2008 

Oceania 

Adams  et  al.,  2001;  Holdaway  and  Porch,  1995; 

Holdaway  et  al.,  2008;  Kirch,  1985;  Kirch  et  al.,  2012; 
McFadgen  et  al.,  1994;  Marwick,  2009;  Miller  et  al.,  1999; 
Mulrooney,  2013;  Smith  and  Ross,  2008; 

Smith  et  al.,  2008;  Turney  and  Hobbs,  2006; 

Williams,  2012;  Williams  et  al.,  2008,  2010,  2013 

the  conditions  and  constraints  on  its  valid  pursuit.  In  response, 
researchers  now  strive  for  a  more  explicit,  comprehensive,  and 
sufficiently  compelling  body  of  middle  range  theory  (MRT)  to 
support  and  regulate  dTFA,  one  that  outlines  the  system  of  trans¬ 
formations  through  which  paleopopulation  size  curves  pass  in  the 
process  of  becoming  archaeological  or  paleontological  tfds.  This 
effort,  outlined  in  Section  1.1,  has  been  underway  since  Rick's 
(1987)  article  on  the  population  dynamics  of  pre-ceramic  Peru 
and  will  likely  continue  for  some  time.  This  paper  focuses  on  one 
aspect  of  this  work,  the  unique  challenges  that  confront  dTFA 
supported  by  radiocarbon  (14C)  age  estimation  (Section  1.2). 

U.  The  MRT  of  dTFA 

The  validity  of  dTFA  rests  on  the  proposition  that  larger  pop¬ 
ulations  tend  to  discard  a  greater  abundance  of  materials  than  do 
smaller  ones  (Rick,  1987).  By  implication,  tfds  generated  from 
samples  of  time-stamped  archaeological  or  paleontological  de¬ 
posits  should  track  temporal  variation  in  past  population  size,  all 
else  being  equal. 

Of  course,  all  else  is  rarely  equal.  Rick  (1987)  qualified  this  strong 
assumption  almost  as  soon  as  he  articulated  it,  identifying  a  number 
of  potential  confounding  factors  each  warranting  dedicated  efforts 
at  mitigation  or  control.  In  the  two  and  a  half  decades  following 
Rick's  article,  disputed  interpretations  have  become  a  standard  in 
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dTFA,  one  of  the  most  recent  and  heated  arising  in  response  to 
Buchanan  and  colleagues'  (2008)  rejection  of  a  Paleoindian  popu¬ 
lation  response  to  the  hypothetical  Younger  Dryas  (YD)  extrater¬ 
restrial  impact  event  (see  Anderson  et  alM  2008;  Bamforth  and 
Grund,  2012;  Buchanan  et  al.,  2011;  Collard  et  al.,  2008;  Culleton, 
2008;  Firestone  et  al,  2007;  Jones,  2008;  Kennett  and  West,  2008; 
Kennett  et  al,  2008;  Steele,  2010;  Weninger  et  al.,  2011). 

General  factors  confounding  dTFA  can  be  divided  between  those 
that  introduce  systematic  error  into  the  population  size  estimate 
and  those  that  introduce  random  error  (Table  2). 


2.2.  Systematic  and  random  error  and  non-demographic  structures 
in  14C-supported  tfds 

Special  issues  in  dTFA  arise  when  archaeologists  or  paleontol¬ 
ogists  depend  on  14C  age-stamped  samples  for  their  analysis.  In  this 
case,  two  additional  factors  become  relevant:  secular  variation  in 
the  concentration  of  atmospheric  14C  (Sonett  and  Finney,  1990),  and 
instrument  measurement  error  (Walker,  2005),  each  entailing 
unique  computational  demands  for  tfd  construction  and  analysis. 
These  demands  have  increasingly  been  met  by  14C  age  calibration 
and  the  construction  of  summed  probability  distributions  (. spds ). 
With  this  type  of  tfd ,  the  probability  density  characterizing  each 
probabilistic  age  estimate  in  the  sample  is  summed  at  each  location 
along  the  timeline: 

U(t)  =  J2fi  (f)  0) 

1=1 

where  u(t)  is  the  sum  of  all  units'  probabilistic  age  estimates  at  time 
t  and  fft)  is  the  probability  density  for  the  i  th  data  point  in  the 
sample  at  time  t.  In  effect,  spds  provide  a  best  guess  regarding  the 
temporal  frequency  of  a  sample  of  units,  given  the  uncertainty  that 
characterizes  14C-based  age  estimation. 

Fig.  2  presents  a  spd  comprising  166  site  occupations  from 
Kodiak.  Data  were  compiled  by  the  present  author,  updating  pre¬ 
vious  databases  compiled  by  Haggarty,  Erlandson,  and  colleagues 
(Erlandson  et  al.,  1992;  Haggarty  et  al.,  1991),  Mills  (1994),  and 
Fitzhugh  (2003).  Dates  excluded  from  analysis  include  strictly 
geologic  dates;  modern  dates  and  dates  reported  without  lab  error; 
dates  lacking  sufficient  contextual  information  to  evaluate  their 
archaeological  validity;  dates  widely  dismissed  by  archaeologists 
working  in  the  region;  and  dates  on  marine  fauna  or  suspected 
marine  fauna.  For  each  site  in  the  sample,  clusters  of  insignificantly 
different  age  estimates  were  pooled  following  the  method  pro¬ 
posed  by  Ward  and  Wilson  (1978),  to  minimize  the  over¬ 
representation  of  intensively  dated  sites.  It  is  worth  noting  that 
Maschner  and  colleagues'  tfd  (Fig.  1)  resembles  the  spd  shown  in 
Fig.  2  in  broad  outline,  as  well  as  in  several  details,  and  particularly 
the  brief  paucity  of  sites  750-650  cal  BP. 

While  14C  age  calibration  and  the  use  of  spds  are  regarded  as 
best  practices  in  14C-supported  dTFA,  the  spds  they  generate  un¬ 
fortunately  exhibit  anomalous  peak-and-trough  structures  arising 
from  several  recognized  sources  of  error: 

1.  Sampling  error,  tfds ,  including  spds,  are  sample  distributions  and 

as  such  approximate  the  shape  of  their  underlying  probability 


1  In  the  dTFA  literature,  the  term  ‘cumulative  probability  distribution/function’ 
has  occasionally  been  used  in  place  of  ‘summed  probability  distribution/function.’ 
This  usage  is  unfavorable,  having  the  potential  to  be  confused  with  either  of  two 
standard  statistical  terms  —  ‘(cumulative)  density  function’  (df  or  cdf,  expressed  F(x) 
in  mathematical  notation)  and  ‘cumulative  probability  function’  —  referring  to  the 
probability  Pr(X  <  x )  for  continuous  and  discrete  variates,  respectively. 


Table  2 

A  typology  of  confounding  factors  in  dTFA. 


Rick's  (1987) 

typology  of 
error  in  dTFA 

Definition 

References 

Creation 

Systematic  error:  diachronic  change  in 

Peros  et  al.,  2010; 

the  prevailing  per  capita  deposition  rate 

Rick,  1987; 

Random  error:  stochastic  variation 
around  the  deposition  rate 
(a  Poisson  process) 

Taylor  et  al.,  2011 

Preservation 

Systematic  error:  (1)  time  transgressive 

Albanese  and 

site  destruction:  (2)  diachronic  change 

Frison,  1995; 

in  the  prevailing  rate  of  site  destruction 

Ballenger  and 

Random  error:  stochastic  variation 

Mabry,  2011; 

around  the  probability  of 

Clevis  et  al.,  2006; 

deposit  survival 

Mandel,  1995; 
Michczynska 
et  al.,  2003;  Rick, 

1987;  Surovell 
and  Brantingham, 

2007;  Surovell 
et  al.,  2009;  Waters 
and  Kuehn,  1996 

Investigation 

Systematic  error:  over-  or 

Ballenger  and 

underrepresentation  of  archaeological 

Mabry,  2011;  Batt 

deposits  from  certain  periods  of  time 

and  Pollard,  1996; 

resulting  from  (a)  the  use  of  particular 

Chamberlain,  2006, 

survey  methods,  or  (b)  heavy  research 

2009;  Dean,  1978; 

focus  on  particular  periods  of 

Faught,  2008; 

special  interest 

Flaynes  et  al.,  2007; 

Random  error:  (1)  Random 

Kennett  et  al.,  2008; 

sampling  error;  (2) 

Michczynska  et  al., 

failure  to  include  accurate 

2003;  Pettitt  et  al., 

dates  or  exclude 

2003;  Rick,  1987; 

spurious  ones  resulting  from  the 

Rieth  and  Hunt, 

implementation  of  imperfect  sample 

2008;  Schiffer, 

hygiene  protocols 

1987;  Shott,  1992; 
Surovell  and 
Brantingham, 

2007;  Waters 
and  Stafford, 

2007;  Williams,  2012 

00 

o 

o 

o 


cal  BP 


Fig.  2.  spd  of  166  site  occupations  from  the  Kodiak  Archipelago.  Ages  were  calibrated 
using  the  IntCall3  (Reimer  et  al.,  2013)  calibration  curve  and  script  written  by  the 
author  in  the  R  programming  language.  See  Supplemental  Material  for  data.  The  blue 
vertical  bar  is  as  in  Fig.  1.  (For  interpretation  of  the  references  to  color  in  this  figure 
legend,  the  reader  is  referred  to  the  web  version  of  this  article.) 
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tfd  as  histogram 
and  rug  plot 


cal  BP 


tfd  as  KDE 
and  rug  plot 


Q-Q  plot 


10000  6000  2000  0 
expected  sample 
(cal  BP) 


Fig.  3.  A:  tfd,  expressed  as  a  histogram  and  rug  plot,  describing  a  sample  of  200  observations  randomly  sampled  from  a  truncated  exponential  TFD  (light  blue  line).  B:  tfd,  expressed 
as  a  KDE  (green  line)  and  rug  plot,  describing  the  same  sample  of  200  observations  as  in  Fig.  3:A.  C:  Quantile-quantile  plot  comparing  the  observed  sample  from  Fig.  3:A  and  3:B  to 
the  expected  sample.  The  diagonal  line  describes  an  ideal  1:1  relationship.  (For  interpretation  of  the  references  to  color  in  this  figure  legend,  the  reader  is  referred  to  the  web  version 
of  this  article.) 


distribution  (TFD)  imperfectly.  Fig.  3  illustrates  the  morpholog¬ 
ical  disparities  standing  between  a  truncated  exponential  TFD 
and  a  tfd  of  200  observations  randomly  drawn  from  it.  Anom¬ 
alous  gaps  and  concentrations  between  observations  are 
evident  in  the  rug  plot  expression  of  the  tfd,  as  indicated  in  the 
quantile-quantile  plot  (Fig.  3:C),  informing  the  artificial  peaks, 
valleys,  and  plateaus  observed  in  both  the  histogram  (Fig.  3:A) 
and  kernel  density  estimate  (KDE;  Fig.  3:B).  The  influence  of 
sampling  error  on  tfd  shape  has  been  explored  by  Michczynska 
and  colleagues  (Michczynska  and  Pazdur,  2004;  Michczynska 
et  al.,  2003),  Williams  (2012),  Armit  et  al.  (2013),  Rhode  and 
colleagues  (in  press),  and  Contreras  and  Meadows  (in  press)  and 
is  also  apparent  in  Bartlein  and  colleagues'  (1995:  Fig.  2)  un¬ 
calibrated  tfd  simulations. 

2.  Dispersive  calendric-to-uC  age  mapping.  Each  calendric  age  cor¬ 
responds  with  a  range  of  possible  14C  ages  rather  than  a  single 
point.  For  example,  the  calendric  age  12,000  cal  BP  corresponds 
with  a  range  of  14C  ages  specified  as  N(  10,237,  47)  by  IntCall3 
(Reimer  et  al.,  2013).  Figs.  4:A  and  5:  A  illustrate  the  disparities 
that  arise  between  14C  ages  having  an  identical  calendric  age  of 
12,000  cal  BP. 

3.  Instrument  measurement  error.  As  is  true  in  the  measurement 
sciences  in  general,  the  instruments  employed  in  14C  age 
measurement  entail  random  error  that  obscures  the  true  14C 
age  of  dated  specimens.  As  a  result,  even  repeated 


measurements  on  a  single  specimen  may  produce  disparate 
estimates.  This  instrument  measurement  error  interacts 
with  dispersive  calendric-to-14C  age  mapping  to  further 
extend  the  range  of  14C  ages  corresponding  with  any  given 
calendric  age,  leading  to  a  propagation  of  error.  In  Fig.  4:A,  the 
true  14C  ages  of  all  three  simulated  observations  have  each 
been  offset  by  a  value  following  a  normal  distribution  speci¬ 
fied  as  N( 0,  50).  Fig.  5:B  shows  the  distribution  of  10,000 
measured  14C  ages  corresponding  with  the  true  14C  ages 
shown  in  Fig.  5: A,  robustly  illustrating  the  pattern  that 
emerges  from  this  propagation  of  error.  Importantly,  when 
the  three  age  estimates  shown  in  Fig.  4:A  are  calibrated 
(Fig.  4:B),  it  becomes  clear  that  the  probabilistic  calendric  age 
estimates  from  which  spds  are  generated  can  be  profoundly 
dislocated  by  this  propagation  of  error.  Bamforth  and  Grund 
(2012),  Armit  et  al.  (2013),  Kerr  and  McCormick  (2014), 
Wicks  and  Mithen  (2014),  and  Contreras  and  Meadows  (in 
press)  have  incorporated  this  propagation  of  error  into  their 
simulation  studies  using  OxCal's  R_simulate  function  (Bronk 
Ramsey,  2009a, b).  In  fact,  this  effect  accounts  entirely  for  the 
differences  observed  between  two  identically  generated  spds 
in  Bamforth  and  Grund's  (2012:  Fig.  1)  study  in  the  absence  of 
sampling  error.  McFadgen  et  al.  (1994)  have  also  explored  the 
influence  of  instrument  measurement  error  on  tfd 
morphology. 
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Fig.  4.  A\  Dispersive  calendric-to-14C  age  mapping  for  three  simulated  observations  having  an  identical  true  calendric  age  of  12,000  cal  BP,  showing  additional  dispersion  through 
instrument  measurement  error.  Dispersive  mapping  for  the  true  age  12,000  cal  BP  follows  the  distribution  rcyBP~N(i±  =  10,237,  a  =  47)  per  IntCall3  (Reimer  et  al.,  2013),  with  an 
instrument  measurement  offset  following  a  normal  distribution  specified  as  N(g  =  0,  a  =  50).  B:  calibrated  probability  distributions  for  the  three  14C  age  estimates  presented  in 
Fig.  4: A,  each  assuming  a  standard  error  of  50  14C  years. 


4.  Calibration  interference.  Calibration  as  it  is  currently  imple¬ 
mented  involves  mapping  the  entire  probability  distribution 
associated  with  a  14C  age  estimate  through  the  calibration  curve 
into  calendric  time.  In  the  process,  the  shape,  location,  and  scale 
of  the  distribution  are  transformed.  Because  the  shape  of  the 
calibration  curve  varies  over  time,  the  degree  to  which  these 
three  distributional  properties  are  transformed  likewise  varies 
along  the  timeline  (Guilderson  et  al.,  2005;  Pazdur  and 
Michczynska,  1989;  Steier  et  al.,  2001;  Stuiver  and  Reimer, 
1989,  1993;  Weninger,  1986;  Weninger  et  al.,  2011).  Fig.  6  il¬ 
lustrates  this  time-variant  effect.  While  the  shape  and  scale  of 
two  probability  distributions  is  identical  along  the  14C  timeline, 
their  respective  expressions  along  the  calendric  timeline  is 
notably  disparate:  the  younger  calendric  estimate  is  more  pre¬ 
cise  than  its  corresponding,  uncalibrated  counterpart  and  is  also 
asymmetrical,  whereas  the  older  calendric  estimate  is  not  only 
more  diffuse  than  its  uncalibrated  counterpart  but  also  shows 
multiple,  asymmetrically  distributed  modes.  The  influence  of 
this  error  on  the  shape  of  14C-supported  tfds  has  been  explored 
by  McFadgen  et  al.  (1994),  Michczynska  and  colleagues 
(Michczynska  and  Pazdur,  2004;  Michczynska  et  al.,  2003), 
Williams  (2012),  Bamforth  and  Grund  (2012),  Armit  et  al.  (2013), 
Kerr  and  McCormick  (2014),  Wicks  and  Mithen  (2014),  and 
Contreras  and  Meadows  (in  press). 

In  effect,  these  four  sources  of  error  entail  a  potential  to  intro¬ 
duce  structural  equifinalities  into  the  shape  of  the  tfd,  with  some 
structures  representing  real  demographic  processes  and  others 
false  ones.  At  the  same  time,  the  appearance  of  real  demographic 
structures  may  be  dampened  by  both  systematic  and  random  error. 

In  response  to  this  potential  for  error,  a  handful  of  simulation  ex¬ 
periments  have  been  conducted  to  explore  the  influence  of  one 


(Bartlein  et  al.,  1995;  Rhode  et  al.,  in  press),  two  (McFadgen  et  al.,  1994; 
Michczynska  and  Pazdur,  2004;  Williams,  2012),  or  three  (Bamforth 
and  Grund,  2012;  Kerr  and  McCormick,  2014;  Wicks  and  Mithen, 

2014)  of  these  sources  of  error.  However,  exploring  the  interaction 
of  all  four  factors  has  only  recently  begun  (Armit  et  al.,  2013 ;  Contreras 
and  Meadows,  in  press).  In  addition,  the  robustness  of  these  experi¬ 
ments  has  for  the  most  part  been  limited  by  unfavorably  small 
simulation  sizes,  involving  either  the  characterization  of  a  single 
simulation  run  (Armit  etal.,  2013;  Bamforth  and  Grund,  2012;  Bartlein 
et  al.,  1995;  Contreras  and  Meadows,  in  press;  Kerr  and  McCormick, 
2014;  McFadgen  et  al.,  1994;  Wicks  and  Mithen,  2014)  or  compari¬ 
sons  of  between  two  and  one  hundred  runs  (two:  Bamforth  and 
Grund,  2012;  five:  Contreras  and  Meadows,  in  press;  ten:  Armit 
et  al.,  2013;  thirty:  Williams,  2012;  one  hundred:  Rhode  et  al.,  in 
press).  Michczynska  and  colleagues'  Monte  Carlo  (MC)  simulations, 
based  on  10,000  runs,  are  the  noteworthy  exceptions  (Michczynska 
and  Pazdur,  2004;  Michczynska  et  al.,  2003). 

The  next  section  presents  a  general  framework  for  a  flexible  MC 
simulation  designed  to  address  these  critical  gaps.  It  begins  by 
considering  the  motivation  for  and  properties  of  effective  MC 
simulations.  Section  3  presents  the  results  of  two  permutations  of 
this  approach,  highlighting  the  influence  of  size-dependent  sam¬ 
pling  error  and  TFD  shape  on  inter-run  tfd  variability. 


2.  A  MC  simulation  approach 

2.2.  Motivation ,  goals,  and  ideal  properties  of  MC  simulations 

Frequently,  modeling  complex  systems  of  causal  relationships 
eludes  closed-form  expression,  inhibiting  the  possibility  of  pre¬ 
dicting  their  outcomes  through  analytical  solution  (Roberts  and 
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solution.  Such  statistics  are  labeled  ‘plug-in  estimators/  to 
distinguish  them  from  both  their  analytic  counterparts  and  or¬ 
dinary  sample  statistics.  Plug-in  estimators  may  include  single¬ 
point  values  (e.g.,  mean,  median,  variance,  percentiles)  or  entire 
sample  distributions  (e.g.,  histograms,  KDEs). 

With  large  enough  sample  sizes,  MC  plug-in  estimators  take 
advantage  of  the  Law  of  Large  Numbers  (LLN),  which  states  that 
sample  statistics  should  converge  toward  their  underlying  param¬ 
eters  as  sample  size  increases  (Christian  and  Casella,  2010).  In 
practice,  the  efficiency  with  which  computers  can  generate  random 
numbers  allows  for  the  implementation  of  incredibly  large  simu¬ 
lations  and  consequently  very  precise  plug-in  estimators. 

However,  when  sample  size  is  itself  a  parameter  of  interest  in  the 
model  system,  as  it  is  when  sampling  error  exerts  a  major  influence  on 
the  shape  of  a  sample  distribution,  variation  reduction  is  no  longer  the 
goal  of  the  simulation.  Rather,  exploring  the  influence  of  sample  size 
on  variation  between  samples  becomes  an  analytical  target.  Even  in 
this  case,  however,  achieving  a  robust  and  precise  estimate  of  varia¬ 
tion  between  samples  requires  the  simulation  of  a  large  set  of  sam¬ 
ples,  warranting  a  two-step  elaboration  on  the  basic  MC  framework: 


Fig.  5.  A:  Distribution  of  true  14C  ages  for  10,000  simulated  observations  having  an 
identical  true  calendric  age  of  12,000  cal  BP.  These  observations  follow  the  normal 
model  specified  as  rcyBPtrue~N(/i  =  10,237,  a  =  47)  (light  blue  line),  based  on  IntCall3 
(Reimer  et  al.,  2013).  The  sample  mean  and  standard  deviation  are  10,237.42  and  47.64, 
respectively.  A  KDE  of  this  sample  (green  line)  closely  approximates  the  model  dis¬ 
tribution.  B:  Distribution  of  measured  14C  ages  for  the  same  10,000  observations  as  in 
Fig.  5: A.  Simulated  measurement  error  follows  the  normal  model  specified  as 
offset~N(/u  =  0,  a  =  50).  Propagation  of  error  implies  a  normal  model  specified  as 
rcyBPmeasmed  ~  N(fi  =  10237,  a  =  \] 472  +  502  =  68.62)  (light  blue  line).  The  sample 
mean  and  standard  deviation  are  10,238.27  and  69.75,  respectively.  A  KDE  of  the 
sample  (green  line)  once  again  closely  approximates  the  model  distribution.  (For 
interpretation  of  the  references  to  color  in  this  figure  legend,  the  reader  is  referred  to 
the  web  version  of  this  article.) 


1.  Generate  a  single  run  of  random  inputs,  manipulating  sample 
size  rather  than  favoring  a  large  one,  and  calculate  the  plug-in 
estimator(s)  of  interest  from  its  output; 

2.  Repeat  the  first  step  a  large  number  of  times,  collecting  plug-in 
estimator(s)  of  interest  from  each  run,  then  calculate  second- 
order,  inter-run  estimators  from  the  collection  of  per-run  esti¬ 
mators.  This  is  the  MC  equivalent  to  solving  for  the  parameters 
of  an  unknown  sampling  distribution. 

2.2.  Simulating  variation  between  UC  spds 


Casella,  2004;  Rubinstein  and  Kroese,  2008;  Thomopoulos,  2013). 

This  is  true,  for  example,  when  the  output  follows  an  unknown 
probability  distribution  that  combines  the  effects  of  separate  but 
interacting  stochastic  processes,  each  governed  by  its  own  data 
generating  process  (DGP).  In  the  context  of  14C-supported  dTFA, 
exploring  the  combined  influence  of  three  stochastic  processes  - 
random  sampling,  calendric-to-14C-age  mapping,  and  instrument 
measurement  error  -  on  the  shape  of  spds  provides  a  case  in  point. 
This  compound  DGP,  in  combination  with  the  complicated  trans¬ 
formation  entailed  by  calibration,  render  any  effort  to  infer  the  spd 
expression  of  a  known  TFD  analytically  intractable. 

When  analytical  solutions  cannot  be  obtained  for  problems  such 
as  this,  MC  simulation  offers  a  means  of  approximating  the  solution 
within  a  reasonable  degree  of  uncertainty.  Customarily,  MC  simu¬ 
lations  conform  to  the  following,  general  procedure: 

1.  Design  an  algorithm  that 

a.  automates  the  passage  of  output  from  earlier  components  of 
the  system  to  later  ones; 

b.  dictates  realistic  functional  relationships,  whether  deter¬ 
ministic  or  probabilistic,  between  each  component's  input 
and  output  variables; 

2.  Hold  certain  variables  in  the  model  system  constant  while 
allowing  others  to  vary  within  meaningful  limits; 

3.  Initialize  the  simulation  by  generating  a  large  set  of  random 
observations  and  passing  them  through  the  algorithm, 
recording  the  final  output  for  each  pass  (as  well  as  intermediate 
outcomes,  if  of  interest); 

4.  Calculate  one  or  more  summary  statistics  for  the  set  of  outcomes 
and  treat  these  as  approximations  for  the  elusive  analytical 


Past  simulation  studies  exploring  14C  spds  have  emphasized  the 
influence  of  several  variables  on  spd  morphology.  These  include  the 
shape,  scale,  and  location  of  the  underlying  TFD  (assuming  for  the 
sake  of  argument  that  this  is  proportional  to  the  population  size 
curve,  N(t));  sample  size  in  a  random  sampling  framework;  the 
shape  of  the  14C-to-calendric-age  curve  (i.e.,  the  calibration  curve), 
used  for  both  calendric-to-14C-age  translation  and  subsequent 
calibration;  and  the  magnitude  of  instrument  measurement  error 
(Armit  et  al.,  2013;  Bamforth  and  Grund,  2012;  Bartlein  et  al.,  1995; 
Contreras  and  Meadows,  in  press;  Kerr  and  McCormick,  2014; 
McFadgen  et  al.,  1994;  Michczynska  and  Pazdur,  2004;  Rhode 
et  al.,  in  press;  Wicks  and  Mithen,  2014;  Williams,  2012). 

The  main  constraint  on  most  of  these  past  simulations'  ability  to 
achieve  robust  estimators  of  inter-run  variance  has  been  a  lack  of 
automation,  often  requiring  the  researchers  themselves  to  transfer 
output  from  one  computing  environment  to  another,  as  well  as 
performing  some  of  the  simulation  steps  by  hand.  For  example, 
Bamforth  and  Grund  (2012)  recorded  the  true  ages  of  their  simu¬ 
lated  samples  in  an  unspecified  environment,  then  passed  these  to 
OxCal  3.10  and  4.1  to  translate  them  into  offset  14C  age  estimates, 
and  finally  passed  these  to  CALIB  5.1  and  6.0  to  generate  spds. 
Similarly,  Williams  (2012)  resampled  from  an  extensive  database  of 
archaeological  14C  dates  in  an  unspecified  environment  and  gener¬ 
ated  one  spd  per  run  in  OxCal  4.1,  for  210  separate  runs.  Contreras 
and  Meadows  (in  press)  digitized  realistic  TFDs  with  Plot  Digitizer, 
generated  random  samples  of  varying  size  from  these  using  an  al¬ 
gorithm  in  R,  and  passed  these  to  OxCal  4.2.3  to  translate  them  into 
offset  14C  age  estimates  and  calibrate  them.  This  constraint  has  been 
due  in  large  part  to  archaeologists'  dependence  on  one  of  the  major 
calibration  programs  -  OxCal,  CALIB,  or  CalPal  -  to  generate  14C 
spds.  Fortunately,  such  dependence  is  not  compulsory,  as  Shennan 
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Fig.  6.  Two  hypothetical  14C  age  estimates  with  identical  precisions  on  the  14C  age  axis  (vertical):  475  ±  30  rcyBP  (blue  curve)  and  875  ±  30  rcyBP  (green  curve).  IntCall3  (Reimer 
et  al.,  2013)  is  shown  in  the  main  panel  (solid  black  lines),  along  with  an  ideal  1:1  relationship  between  calendric  and  14C  time  (dashed  line).  The  varying  slope  of  the  calibration 
curve  mediates  the  mapping  relationship  between  14C  and  calendric  age  estimates  (horizontal  axis).  (For  interpretation  of  the  references  to  color  in  this  figure  legend,  the  reader  is 
referred  to  the  web  version  of  this  article.) 


et  al.  (2013)  have  demonstrated,  and  the  potential  power  that  we 
might  hope  to  achieve  by  conducting  simulation  studies  of  spd 
formation  in  a  single  computing  environment  is  well-illustrated  by 
Michczynska  and  colleagues'  [2004]  unparalleled  10,000-run 
simulation  in  GdCALIB.  The  free  availability  of  the  R  programming 
language  now  puts  the  potential  for  fully  automated,  single¬ 
environment  simulations  at  the  fingertips  of  archaeologists.  Such 
well-integrated  simulation  studies  would  take  the  researcher  out  of 
the  process  save  for  in  choosing  how  to  manipulate  the  parameters 
of  interest.  This  could  result  in  a  vast  improvement  in  simulation 
efficiency  and  a  significant  reduction  in  user  error. 

The  following  presents  the  general  framework  of  a  MC  algo¬ 
rithm,  implemented  in  R  and  designed  to  be  flexible  enough  to 
accommodate  the  experimental  manipulation  of  those  parameters 
listed  above.  In  general,  the  DGP  underlying  this  algorithm  com¬ 
bines  the  following  elements  : 

•  A  set  of  runs  is  generated,  each  run  comprising  a  sample  of 
observations  randomly  drawn  from  an  identical  TFD.  Each 
observation  simulates  the  true  calendric  age  of  a  single  unit  of 
observation  (as  in  Fig.  3): 

calBP true  i  n  ~  TFD(O)  (2) 


2  Simulation  code  in  the  R  programming  language  is  available  at  https://github. 
com/brownwuw/radiocarbon_spd_DGP_MC_simulation_code/blob/master/MCsim. 
functions. 


where  the  TFD  is  specified  as  a  function  of  a  vector  of  parameters  6. 

•  Each  true  calendric  age  is  mapped  into  14C  time  according  to  the 
relationship 

RCYBP  true  \  n  ~  N([acc(cciIBP) ,  gqc (cqIBP))  (3) 

where  pcdcalBP)  is  the  mean  14C  age  corresponding  with  the  true 
calendric  age  (equivalent  to  r(tz)  in  Bronk  Ramsey,  2009b:  Eq.  22) 
and  crcc(ca/BP)  is  the  accompanying  standard  deviation  (equivalent 
to  s(ti)  in  Bronk  Ramsey,  2009b:  Eq.  22),  as  dictated  by  a  specified 
calibration  curve. 

•  True  14C  ages  are  offset  to  simulate  instrument  measurement 
error  ( IME ,  equivalent  to  Sj  in  Bronk  Ramsey,  2009b:  Eq.  (22)) 
according  to  the  dispersive  relationship 

RCYBP measured,  ~  N (RCYBPtrue  1  (4) 

( RCYBPmeasuredti  being  equivalent  to  rz  in  Bronk  Ramsey,  2009b: 
Eq.  (22)).  The  combined  effect  of  this  and  the  preceding  step  is  a 
propagation  of  error  as  discussed  in  Section  1.1,  such  that. 

RCYBP measured A , . . . ,n  ~  N  (^icc(calBP) ,  yj aCc(calBP)2  +  IME2^j 

(5) 

whose  probability  density  follows  the  posterior  probability  func¬ 
tion  given  by  Bronk  Ramsey  (2009b:  Eq.  22). 
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•  Measured  14C  ages  are  calibrated  according  to  an  algorithm 
similar  to  that  used  by  OxCal,  CALIB,  and  most  recently  Shennan 
et  al.  (2013).  (For  debate  regarding  the  merits  of  different  cali¬ 
bration  algorithms,  see  Buchanan  et  al.,  2011;  Culleton,  2008; 
Steele,  2010;  Weninger  et  al.,  2011.)  Once  calibrated,  the  prob¬ 
ability  distributions  for  all  observations  in  a  single  run  are  then 
summed  per  Eq.  (1 ),  then  rescaled  so  that  the  area  under  the  spd 
sums  to  1. 

•  Plug-in  estimators  are  calculated  summarizing  the  distribu¬ 
tional  properties  of  all  runs'  spd  values  at  five-year  intervals 
along  the  calendric  timeline. 

To  demonstrate  the  utility  and  flexibility  of  this  MC  framework,  I 
focus  here  on  an  experiment  exploring  the  combined  influence  of 
calibration  interference,  sampling  error,  and  TFD  shape  on  spd 
morphology.  The  eight  sample  sizes  considered  (n  =  30,  50,  100, 
200,  500, 1000, 1500,  and  2000)  span  a  range  typical  for  studies  in 
dTFA.  The  two  TFDs  considered  (a  uniform  and  an  exponentially 
increasing  TFD)  are  both  truncated  at  7500  and  0  cal  BP  to  address 
the  range  of  time  covered  by  Kodiak's  known  occupation  history 
(see  Figs.  1  and  2).  These  TFD  morphologies  simulate  a  stationary 
population  and  a  stable  population  growing  at  a  constant  rate  of 
approximately  0.7064  people  per  thousand  people  per  year, 
respectively.  As  previous  researchers  have  observed  (Bartlein  et  al., 
1995;  McFadgen  et  al.,  1994;  Bamforth  and  Grund,  2012),  the  uni¬ 
form  distribution  affords  an  ideal  means  of  demonstrating  the 
intrusion  of  artificial  structures  into  tfds,  given  its  structureless 
morphology.  A  single  calibration  curve  is  considered  (IntCall3; 
Reimer  et  al.,  2013).  Finally,  a  single  value  for  instrument  mea¬ 
surement  error  was  considered  (50  14C  years),  approximating  the 
median  and  mean  standard  errors  for  the  set  of  Kodiak  dates 
illustrated  in  Fig.  2. 

For  each  sample  size  and  TFD,  2000  runs  were  generated  per  Eq. 
(2),  for  a  total  of  32,000  runs  processed.  With  a  maximum  sample 
size  of  2000  observations  per  run  and  two  TFDs  considered,  8 


million  individual  age  estimates  were  generated  in  total.  Each 
observation  was  mapped  onto  the  14C  timeline  and  offset  per  Eqs. 
(3)  and  (4),  then  calibrated.  All  observations  per  run  were  summed 
per  Eq.  ( 1 )  and  rescaled  so  that  the  area  under  each  spd  sums  to  1,  in 
other  words  such  that  each  spd  comprises  a  time  series  of  sample 
proportions.  The  inter-run  mean  sample  proportion  and  four  per¬ 
centiles  (2.5%,  25%,  75%,  and  97.5%)  were  calculated  at  five-year 
intervals  along  the  calendric  timeline. 


3.  Results 

Figs.  7  and  8  present  the  simulation  output  for  all  sixteen  sim¬ 
ulations  (eight  sample  sizes  per  model  TFD).  Four  patterns  emerge 
from  their  examination: 

First,  the  inter-run  mean  of  sample  proportions  closely  ap¬ 
proximates  the  underlying  TFD,  as  expected  given  the  LLN  and  the 
2000  runs  per  simulation.  However,  two  departures  from  this 
pattern  are  worth  noting,  one  at  each  end  of  the  model  TFD: 

•  Lower-than-expected  inter-run  means  are  indicated  toward  the 
earlier  end  of  each  distribution,  resulting  from  the  dispersive 
random  error  introduced  by  14C  age  translation  and  instrument 
offset.  In  effect,  this  dispersion  has  led  to  the  extension  of  spd 
sample  mass  beyond  the  sharply  defined  limit  of  the  underlying 
TFD. 

•  A  high  degree  of  random  error  in  the  inter-run  mean  of  sample 
proportions,  as  well  as  in  their  percentiles,  is  indicated  over  the 
last  500  years  of  each  distribution.  Williams  (2012)  noted 
similarly  high  variance  in  his  bootstrap  samples  for  the  same 
temporal  interval.  This  period  of  heightened  inter- run  variability 
is  likely  a  result  of  the  marked  increase  in  precision  of  the 
calibration  curve  over  this  interval  (Fig.  9),  which  would  act  not 
only  to  decrease  the  dispersion  evident  in  the  mapping  of 
calendric  into  14C  ages  (Eq.  (3))  but  also  to  increase  the  precision 
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Fig.  7.  Results  of  eight  MC  simulations,  showing  inter-run  variation  between  2000  identically  generated  spds  per  sample  size,  based  on  a  uniform  TFD  (black  line).  The  dotted  blue 
lines  show  the  2.5%  and  97.5%  boundaries,  while  the  light  blue  polygon  delineates  the  interquartile  range.  The  10-  and  100-year  moving  average  slopes  of  the  IntCall3  (Reimer  et  al., 
2013)  calibration  curve  are  shown  at  top  (light  and  dark  gray  curves,  respectively).  (For  interpretation  of  the  references  to  color  in  this  figure  legend,  the  reader  is  referred  to  the  web 
version  of  this  article.) 
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Fig.  8.  Results  of  eight  MC  simulations,  showing  inter-run  variation  between  2000  identically  generated  spds  per  sample  size,  based  on  a  truncated  exponential  TFD  (blue  line). 
Visual  specifications  are  as  in  Fig.  7.  (For  interpretation  of  the  references  to  color  in  this  figure  legend,  the  reader  is  referred  to  the  web  version  of  this  article.) 


of  calibrated  age  estimates'  probability  distributions  (as  in 

Fig.  6). 

Second,  the  magnitude  of  variation  between  runs  decreases  as 
sample  size  increases,  converging  toward  the  underlying  TFD,  as 
expected  given  the  LLN.  However,  for  small  and  even  moderate 
sample  sizes,  the  lower  percentile  boundaries  fall  at  or  hover 
slightly  above  0.  For  the  uniform  TFD  simulations,  the  lower  25% 
boundary  has  lifted  away  only  by  n  =  100  and  the  2.5%  boundary 
only  by  n  =  200.  A  similar  pattern  holds  for  the  exponential  TFD 
simulations,  although  the  earlier  end  of  the  curve  remains  low  in 
response  to  the  low  TFD  values  at  this  point.  Additionally,  the  inter¬ 
run  mean  falls  above  the  75%  boundary  near  the  early  end  of  the 
exponential  TFD  simulation  at  n  =  30. 

Third,  in  the  case  of  the  exponential  TFD  simulations,  the 
dispersion  of  sample  proportions  generally  increases  as  the  mass  of 
the  underlying  TFD  value  increases,  producing  the  curved  wedge 
shape  characterizing  the  percentile  boundaries  shown  in  Fig.  8.  A 
similar  pattern  should  be  expected  for  any  TFD  exhibiting  de¬ 
partures  from  uniform  shape  over  the  timeline.  Fig.  10  illustrates 
spds  for  two  simulation  runs  (n  =  200)  out  of  the  2000  illustrated  in 
Fig.  8.  Here  it  is  worth  noting  that,  while  both  spds  are  the  products 
of  an  identical  DGP,  their  inter-run  disparities  are  greatest  toward 
the  later,  higher-density  end  of  the  underlying  TFD. 

Fourth,  the  fluctuations  observed  in  the  percentile  estimators 
over  time  indicate  positional  stability  across  simulations,  this 
pattern  being  particularly  obvious  in  the  uniform  TFD  simulations 
(Fig.  7).  Furthermore,  these  fluctuations  appear  to  correspond  with 
fluctuations  in  the  average  calibration  curve  slope,  as  already  noted 
by  Michczyriska  and  Pazdur  (2004)  and  as  observed  in  Fig.  11.  On 
this  point,  a  revision  of  previous  statements  about  the  nature  of 
calibration  interference  is  warranted  (see  Bamforth  and  Grund, 
2012;  McFadgen  et  al.,  1994;  Williams,  2012).  Temporal  intervals 
characterized  by  steep  calibration  curve  slopes  do  not  always 
exhibit  anomalously  high  sample  proportions  but  rather  a  greater 
range  of  variability  between  spds,  on  both  sides  of  the  underlying 
TFD  value.  In  effect,  the  positional  stability  of  high-  and  low- 


variability  intervals  is  an  artifact  of  calibration  interference  (as  in 
Fig.  6),  whereas  their  direction  and  magnitude  constitute  random 
error  stemming  from  the  combined  operation  of  sampling  error  (as 
in  Fig.  3)  and  the  dispersive  transformation  of  true  calendric  ages 
into  measured  14C  age  estimates  (as  in  Fig.  4:A). 


4.  Discussion 

The  results  of  the  MC  simulations  presented  here  reinforce 
previous  simulation-based  impressions  (1)  that  the  nonlinear 
relationship  between  the  calendric  and  14C  timelines  introduces 
artificial,  time-dependent  structures  into  spds,  and  (2)  that  it  is 
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Fig.  9.  Standard  deviation  around  the  IntCall3  calibration  curve  (Reimer  et  al.,  2013), 
over  the  last  10,000  calendric  years. 
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Fig.  10.  A:  10-  and  100-year  moving  average  of  the  IntCall3  (Reimer  et  al.,  2013)  calibration  curve  slope  (light  and  dark  gray  curves,  respectively).  B-C:Two  MC  simulations 
(n  =  200),  identically  generated  from  a  truncated  exponential  distribution  as  in  Fig.  8.  2.5%  and  97.5%  inter-run  boundaries  from  Fig.  8  are  repeated  here  (dotted  blue  line)  for 
comparison.  Light  blue  vertical  bar  demarcates  the  timespan  850-750  cal  BP,  as  in  Fig.  1.  (For  interpretation  of  the  references  to  color  in  this  figure  legend,  the  reader  is  referred  to 
the  web  version  of  this  article.) 


difficult  to  distinguish  between  true  demographic  structures  and 
non-demographic  ones  in  small-sample  tfds,  including  spds.  These 
observations  recommend  against  any  attempt  to  address  hypoth¬ 
eses  about  short-run  demographic  processes  through  14C-sup- 
ported  dTFA,  particularly  when  working  with  modest  sample  sizes. 
Simply  put,  the  tfds  that  such  analyses  produce  are  too  sensitive  to 
the  various  DGPs  that  generate  them  to  validly  influence  our 
assessment  of  such  high-resolution  hypotheses.  The  disputed  de¬ 
mographic  dimension  of  the  Kachemak-Koniag  transition  provides 
a  case  in  point.  On  the  one  hand,  a  scarcity  of  site  occupations 
during  the  century  750-650  cal  BP  is  evident  in  tfds  based  on  what 
data  are  currently  available  (Figs.  1  and  2).  On  the  other  hand,  a 
comparison  of  Fig.  10:A  and  10: B  illustrates  that  even  when  two 
spds  are  generated  from  an  identical  DGP,  these  will  nevertheless 
exhibit  idiosyncratic  structures  relative  to  their  underlying  TFD  and 
disparate  structures  relative  to  each  other.  The  first  simulated  spd 
(Fig.  10:B)  exhibits  a  peak-trough-peak  sequence  over  the  interval 
850  and  550  cal  BP,  resembling  those  structures  observed  in  the 
Kodiak  spd  over  the  same  interval  (Fig.  2),  whereas  the  second 
simulation  (Fig.  10:C)  instead  exhibits  a  dramatic  increase  over  the 
same  interval.  Conversely,  the  underlying  TFD  shows  stable  and 
gradual  growth  over  the  same  interval,  proportional  to  a  population 
growing  at  a  constant  rate.  In  short,  assuming  for  the  sake  of 
argument  that  the  exponentially  increasing  TFD  in  Fig.  10  is 
representative  of  population  dynamics  in  Kodiak  over  the  period  of 
interest,  its  empirical  expression  should  nevertheless  be  expected 
to  exhibit  structures  consistent  with  alternative  accounts,  particu¬ 
larly  given  the  modest  sample  size  available  for  the  region.  In  light 


of  the  multiple  stochasticities  that  affect  small-sample  spds,  a 
suspension  of  judgment  regarding  the  demographic  dimension  of 
the  Kachemak-Koniag  transition  is  warranted,  reprising  Mills' 
(1994)  warning  of  two  decades  ago. 

Cautionary  tales  notwithstanding,  the  patterns  identified  in 
Section  3  can  also  be  used  to  inform  our  efforts  to  identify  effective 
strategies  for  mitigating  some  of  the  non-demographic  error,  both 
systematic  and  random,  that  troubles  spds.  To  begin  with,  the 
observation  that  variation  decreases  as  sample  size  increases  is 
neither  surprising  nor  novel.  In  the  realm  of  dTFA,  Chamberlain 
(2006,  2009)  and  Surovell  and  Brantingham  (2007)  have  already 
insisted  that  tfds  can  only  be  treated  as  robust  approximations  of 
their  underlying  TFDs  given  particularly  large  or  dense  samples. 
Even  so,  our  sense  of  what  qualifies  as  a  large  or  dense  sample  has 
not  yet  been  well-calibrated.  As  Michczynska  and  Pazdur  (2004) 
and  Williams  (2012)  have  warned,  and  as  the  above  simulations 
reconfirm,  marked  structural  disparities  between  pairs  of  spds 
identically  generated  from  the  same  TFD  can  be  quite  dramatic  for 
sample  sizes  less  than  200  to  500.  By  extension,  we  should  expect 
the  story  to  change  as  sample  size  increases.  Consequently,  we 
should  embrace  every  opportunity  to  increase  sample  size  as  a 
means  of  increasing  the  robustness  of  our  TFD  estimations,  when 
opportunities  exist  to  conduct  further  field  surveys  or  revisit 
museum  collections. 

While  the  relationship  between  spd  morphology  and  calibration 
curve  slope  has  long  been  recognized,  descriptions  of  this  sys¬ 
tematic  error  have  remained  incomplete  until  now.  Michczynska 
and  colleagues'  assertion  that  “the  influence  of  the  shape  of  the 
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Fig.  11.  A:  100-year  moving  average  of  the  IntCall3  (Reimer  et  al.,  2013)  calibration  curve  slope.  B:  Inter-run  interquartile  ranges  (IQRs)  for  large-sample  spds  (n  =  2000)  based  on 
the  uniform  distribution.  The  solid  horizontal  line  is  the  median  IQR  value.  Dashed  horizontal  lines  bound  the  mid-50%  range  of  IQRs.  Light  blue  bars  identify  those  temporal 
intervals  exhibiting  exceptionally  high  IQRs  (at  or  above  the  75%  boundary),  while  light  green  bars  identify  those  exhibiting  the  lowest  (at  or  below  the  25%  boundary).  (For 
interpretation  of  the  references  to  color  in  this  figure  legend,  the  reader  is  referred  to  the  web  version  of  this  article.) 


calibration  curve  on  the  95%  confidence  intervals  is  clearly  visible” 
(2004:  739)  lacks  specificity,  though  it  is  powerfully  illustrated  in 
their  Fig.  7.  Conversely,  those  statements  that  have  been  more 
specific  (see  Bamforth  and  Grund,  2012;  McFadgen  et  al.,  1994; 
Williams,  2012)  have  also  been  mistaken  in  asserting  that  steep 
calibration  curve  slopes  necessarily  produce  anomalously  exag¬ 
gerated  peak  structures  in  spds  whereas  gradual  slopes  and  pla¬ 
teaus  dampen  real  peaks.  We  can  now  revise  this  understanding, 
recognizing  that  calibration  interference  manifests  primarily  as  a 
time-dependent  fluctuation  in  inter-run  variance  and  that  other 
stochastic  processes  determine  whether  this  variance  manifests  as 
peaks  or  troughs.  By  implication,  we  should  have  greater  confi¬ 
dence  in  the  accuracy  of  spd  morphology  over  those  temporal  in¬ 
tervals  corresponding  with  low  calibration  curve  slopes  than  those 
corresponding  with  high.  For  the  range  of  time  considered  in  this 
paper,  Fig.  11  :B  identifies  several  periods  that  are  most  reliable 
(green  bars)  and  those  that  are  least  (blue  bars).  In  particular,  the 
latter  half  of  the  5th  millennium  cal  BP  and  the  five  centuries 
centering  on  2500  cal  BP  are  especially  reliable,  whereas  the  in¬ 
tervals  7500-7100;  6500-6200;  5600-5400;  2900-2750;  and 
especially  1500-500  cal  BP  show  the  greatest  inter- run  variance. 
The  last  500  years  BP  are  erratic,  this  being  a  function  of  noticeably 
low  variance  in  the  calibration  curve  over  this  terminal  period  (see 
Fig.  9).  That  the  Kachemak-Koniag  transition  transpired  during  a 
period  characterized  by  pronounced  inter-run  spd  variance  once 
again  recommends  that  we  suspend  judgment  regarding  the  de¬ 
mographic  information  that  our  dTFA  offers  for  this  interval. 

The  solution  to  calibration  interference  recommended  by 
Williams  (2012)  is  to  smooth  out  spds  with  a  moving  average,  using 
a  bandwidth  of  500  for  the  interval  0-11,000  cal  BP  or  of  800  for 
11,000-50,000  cal  BP  (see  also  Ozainne  et  al.,  2014;  Peros  et  al., 


2010;  Shennan  et  al.,  2013).  This  coarse-grained  approach, 
already  anticipated  in  McFadgen  and  colleagues’  (1994)  work,  re¬ 
verses  the  sorting  of  sample  density  that  results  from  calibration.  Of 
equal  importance,  although  unrecognized  by  Williams,  is  that  this 
smoothing  method  is  operationally  identical  to  kernel  density 
estimation,  a  nonparametric  method  for  estimating  the  probability 
density  functions  underlying  samples  (Baxter  and  Beardah,  1997; 
Sheather,  2004).  The  goal  of  kernel  density  estimation  is  to  ach¬ 
ieve  an  optimal  balance  between  the  reduction  of  sampling  error 
and  the  preservation  of  structures  original  to  the  underlying 
probability  distribution.  In  the  future,  the  MC  framework  presented 
here  will  be  used  to  pursue  optimized  algorithms  for  kernel  density 
estimation  in  spd- based  dTFA.  The  application  of  such  ‘objective’ 
algorithms  would  go  far  in  neutralizing  the  potential  for  analytical 
duplicity  (Whallon's  [1987]  ‘sleights  of  hand’).  Additionally, 
because  kernel  density  estimation  algorithms  tend  to  adapt  to 
sample  size  and  density,  their  application  would  lessen  the  need  to 
pursue  larger  samples,  though  at  the  expense  of  our  ability  to 
detect  fine-grained  structures  such  as  that  suggested  by  Maschner 
et  al.  (2009b). 

One  approach  to  dTFA  that  has  recently  emerged  is  the  mea¬ 
surement  of  proxy  growth  rate  estimates  from  tfd  curves  (Crema, 
2012;  Collard  et  al.,  2010;  Peros  et  al.,  2010),  analogous  to  the 
measurement  of  population  growth  from  N(t)  curves  (Preston  et  al., 
2001;  Skalski  et  al.,  2005).  Such  measurements  however  require 
relatively  precise  N(t)  estimates  at  two  points  along  the  curve, 
which  are  not  easy  to  come  by  for  paleopopulations.  When  N(t) 
estimates  for  paleopopulations  are  imprecise,  an  alternative 
approach  suggested  by  Cohen  (1995:  77-78)  is  to  calculate  a  range 
of  plausible  growth  rates  based  on  maxima  and  minima  for  the  two 
N(t)  values  bracketing  the  interval  of  interest,  if  such  information  is 


144 


W.A.  Brown  /  Journal  of  Archaeological  Science  53  (2015)  133—147 


available.  Crema's  (2012)  estimation  of  the  slope  dynamics  of  a  tfd 
comprising  mid-Holocene  Japanese  housepits  followed  this  general 
approach.  The  time-dependence  of  calibration  interference  once 
again  suggests  that  if  we  are  to  derive  growth  rate  estimates  from 
spds ,  the  most  accurate  estimates  will  be  for  those  intervals 
bracketed  by  ages  exhibiting  low  inter- run  variance  (i.e.,  those 
designated  by  green  bars  in  Fig.  11).  The  application  of  optimized 
KDEs  should  also  improve  the  accuracy  of  such  estimates. 

Finally,  while  the  simulations  presented  in  this  paper  demon¬ 
strate  that  the  four  DGPs  they  model  produce  structures  in  indi¬ 
vidual  spds  that  mimic  genuine  demographic  structures  with 
worrisome  regularity,  it  has  been  suggested  elsewhere  that  the 
identification  of  temporally  coincident  structures  between  tfds 
from  adjacent  study  regions  may  be  employed  to  cross-validate  the 
demographic  interpretation  of  such  structures  in  each  tfd  included 
in  the  comparison  (Hinz  et  al.,  2012;  Maschner  et  al.,  2009b:  48; 
Maschner,  personal  communication).  Validation  and  identification 
of  constraints  on  this  approach  await  future  exploration. 

By  design,  the  results  of  this  MC  experiment  do  not  explore  the 
influence  of  other  model  system  parameters  on  variability  between 
runs,  particularly  more  dispersed  TFDs  and  a  greater  or  more  var¬ 
iable  degree  of  instrument  measurement  error.  As  suggested  by 
Michczynska  and  Pazdur  (2004)  and  by  Williams  (2012),  variability 
in  both  of  these  parameters  may  warrant  an  adaptive  approach  to 
sample  size  threshold.  Alternatively,  an  adaptive  kernel  density 
estimation  algorithm  could  mitigate  the  effects  of  variability  in 
these  parameters.  Similarly,  the  truncated  uniform  and  exponential 
TFD s  explored  in  this  paper  were  chosen  for  their  structural 
simplicity,  enabling  the  straightforward  identification  of  the  non¬ 
demographic  structures  that  emerge  in  spds.  Inputting  more 
structurally  complex  TFDs  into  the  MC  simulation,  which  would 
emulate  population  size  curves  more  realistically  (see  Bamforth 
and  Grund,  2012;  Contreras  and  Meadows,  in  press;  Rhode  et  al., 
in  press),  would  allow  us  to  explore  in  greater  depth  the  problem  of 
equifinality  that  exists  between  real  demographic  structures  and 
impostors. 

5.  Conclusion 

The  MC  simulation  framework  presented  in  this  paper  is 
intended  to  build  on  the  precedent  of  these  past  studies,  first  by 
automating  the  simulation  algorithm  and  thereby  allowing  for 
larger  and  more  robust  simulations,  and  second  by  allowing  the 
researcher  to  experimentally  manipulate  the  many  parameters  that 
influence  tfds.  To  illustrate  the  power  of  this  approach,  I  chose  to 
evaluate  the  impact  of  sample  size  and  TFD  shape  on  spd  structure, 
holding  other  parameters  of  the  system  constant.  This  experiment 
reveals  a  more  nuanced  interaction  between  calibration  and  sto¬ 
chastic  processes  than  previously  articulated:  calibration  interfer¬ 
ence  expresses  itself  as  a  time-dependent  fluctuation  in  the  degree 
of  inter-run  variability.  The  stochastic  processes,  on  the  other  hand, 
determine  the  direction  and  degree  of  such  variability.  Several  so¬ 
lutions,  including  sample  size  increase,  a  focus  on  the  least  variable 
temporal  intervals,  and  kernel  density  estimation,  may  be  effective 
in  mitigating  these  problems,  although  the  last  solution  awaits 
further  exploration.  Future  work  is  also  warranted  to  explore  the 
impacts  of  other  parameters  of  the  model  system  (chiefly  instru¬ 
ment  error  and  TFD  scale  and  location)  on  the  structure  of  14C  spds. 
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